feat: cross-endpoint routing for serverless functions #129
Merged
Add DirectoryClient to query the mothership endpoint directory with:
- Retry logic with exponential backoff (3 attempts)
- Configurable timeout (10s default)
- Proper error handling and logging
- Async context manager support
- Connection pooling via httpx
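The retry behaviour listed above could look roughly like the sketch below. It is not the DirectoryClient code: `retry_with_backoff`, its parameters, and the 1-second base delay are hypothetical, and the HTTP call is abstracted as any async callable so the example runs without httpx. Only the 3 attempts and the 10s timeout come from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,     # from the commit message
    timeout: float = 10.0,     # from the commit message
    base_delay: float = 1.0,   # assumption: not stated in the source
) -> T:
    """Run `operation`, retrying on network-style errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            # Bound each attempt by the timeout.
            return await asyncio.wait_for(operation(), timeout=timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Delays grow as base_delay * 1, 2, 4, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"operation failed after {max_attempts} attempts") from last_error
```

With httpx, `operation` would typically be a small coroutine that calls `client.get(...)` on a shared `httpx.AsyncClient`, so the connection pool is reused across retries.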
Add ServiceRegistry to manage manifest loading, directory queries, and routing decisions with:
- Manifest loading from file, env var, or auto-detection
- On-demand directory loading via DirectoryClient with caching
- Cache TTL support (300s default, configurable)
- Function routing decisions (local vs remote)
- Resource and function metadata access
- Graceful degradation if directory unavailable
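A minimal sketch of the cached, on-demand directory load with a TTL follows. The class name `CachedDirectory` and the `fetch` callable are hypothetical; the 300s default comes from the list above, and the `asyncio.Lock` reflects the thread-safe cache mentioned in a later commit of this PR. The real ServiceRegistry may structure this differently.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional


class CachedDirectory:
    """Caches the result of `fetch` for `ttl` seconds (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], Awaitable[Dict[str, str]]],
        ttl: float = 300.0,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._lock = asyncio.Lock()
        self._directory: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _is_fresh(self) -> bool:
        return (
            self._loaded_at is not None
            and time.monotonic() - self._loaded_at < self._ttl
        )

    async def get(self) -> Dict[str, str]:
        if self._is_fresh():
            return self._directory
        async with self._lock:
            # Re-check after acquiring the lock: another task may have refreshed.
            if not self._is_fresh():
                try:
                    self._directory = await self._fetch()
                    self._loaded_at = time.monotonic()
                except Exception:
                    # Graceful degradation: keep the previous (possibly empty) data.
                    pass
        return self._directory
```

The second freshness check inside the lock is what keeps concurrent callers from each triggering their own directory query.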
Add HTTP client for cross-endpoint function execution with:
- Async/sync job submission to RunPod endpoints
- Async job polling with configurable intervals and timeouts
- Cloudpickle serialization/deserialization of arguments
- Authentication via RUNPOD_API_KEY header
- Error handling and response format handling
- Connection pooling via httpx.AsyncClient
- Async context manager support

Add routing wrapper that intercepts stub execution and determines if function calls should be executed locally or routed to remote endpoints with:
- Function routing decision based on ServiceRegistry
- Automatic directory loading before routing decisions
- Remote execution via HTTP with proper payload construction
- Class method execution support
- Error handling and logging
- Singleton factory pattern for component reuse

Add ProductionWrapper injection to stubs/registry.py to enable cross-endpoint routing for LiveServerless and CpuLiveServerless resources.
- Check for RUNPOD_ENDPOINT_ID environment variable (production mode indicator)
- Create and inject wrapper around both stubbed_resource and execute_class_method
- Preserve original behavior when not in production
- Graceful fallback if ProductionWrapper import fails
- No changes to public API or user-facing behavior

This enables transparent cross-endpoint function routing while maintaining full backward compatibility.
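The injection rule described above can be sketched as follows. `maybe_wrap_for_production` and `make_wrapper` are hypothetical names used only for illustration; the actual code in stubs/registry.py is not shown in this PR page. The sketch assumes the wrapper factory raises `ImportError` when the runtime module is unavailable.

```python
import os
from typing import Any, Callable

Stub = Callable[..., Any]


def maybe_wrap_for_production(
    stub: Stub,
    make_wrapper: Callable[[Stub], Stub],
) -> Stub:
    """Wrap `stub` only when running on a deployed endpoint."""
    # RUNPOD_ENDPOINT_ID is the production-mode indicator named in the commit.
    if not os.environ.get("RUNPOD_ENDPOINT_ID"):
        return stub  # local/dev: original behaviour is preserved
    try:
        return make_wrapper(stub)
    except ImportError:
        # Graceful fallback: routing is skipped, the stub still works.
        return stub
```

Because the unwrapped stub is returned unchanged outside production, callers see the same public API either way.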
Add comprehensive integration tests covering the full routing flow:
- Local function execution (no remote call)
- Remote function execution via HTTP
- On-demand directory loading
- Error propagation from remote endpoints
- Factory creates complete integrated system

These tests validate the entire stack from ServiceRegistry → ProductionWrapper → CrossEndpointClient → HTTP execution, ensuring all components work together.

- Remove CrossEndpointClient HTTP client duplication (~250 lines eliminated)
- Add get_resource_for_function() to ServiceRegistry that returns ServerlessResource
- Modify ProductionWrapper to use ServerlessResource.run_sync() for remote execution
- Delete http_client.py and test_http_client.py (replaced by ServerlessResource)
- Update ProductionWrapper tests to mock ServerlessResource instead of HTTP client
- Add unit tests for get_resource_for_function() in ServiceRegistry tests
- Update integration tests to mock ServerlessResource
- Simplify ServerlessResource import (no circular dependency)
- All 405 tests pass with 65% coverage

Remove unnecessary lazy imports from inside the fixture; there is no circular dependency issue. ResourceManager and SingletonMixin don't create circular imports when imported at module level.

…ements

Add comprehensive lessons learned from the recent refactoring session:
- Add async thread safety pattern to Async Best Practices section
- Add custom exception hierarchies to Error Handling section
- Expand anti-patterns with URL parsing and unreachable code examples
- Add mock alignment lesson to Testing Requirements
- Create new Configuration Patterns section for constant centralization

These lessons reflect improvements made to the cross-endpoint routing feature:
- Thread-safe async cache with asyncio.Lock
- Custom exception hierarchy (RuntimeError → RemoteExecutionError, SerializationError)
- Robust URL parsing with urllib.parse.urlparse
- Centralized configuration in config.py module
- Test mock alignment with actual API contracts

All examples use consistent GOOD / BAD pattern for clarity.

Add custom exception hierarchy and centralized configuration:
- Create exceptions.py with RuntimeError base and domain-specific exceptions
- Create config.py with centralized constants (timeouts, retries, cache TTL)
- Add asyncio.Lock for thread-safe directory cache in ServiceRegistry
- Improve URL parsing with urllib.parse.urlparse and validation
- Fix JobOutput API mismatch: check error field instead of success attribute
- Add serialization error handling with custom SerializationError
- Improve type hints across runtime modules
- Update tests to align with actual API contracts
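To illustrate the "URL parsing with urllib.parse.urlparse and validation" item, here is a small sketch. The commit does not say what is extracted from the URL, so the function name `endpoint_id_from_url` and the assumption that the endpoint ID is the last path segment are both hypothetical.

```python
from urllib.parse import urlparse


def endpoint_id_from_url(url: str) -> str:
    """Return the last path segment of an endpoint URL (assumed to be its ID)."""
    parsed = urlparse(url)
    # Validate structure instead of slicing the raw string.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid endpoint URL: {url!r}")
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not segments:
        raise ValueError(f"endpoint URL has no path: {url!r}")
    return segments[-1]
```

Compared with `url.split("/")[-1]`, this handles trailing slashes and query strings and rejects malformed input explicitly.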
Pull request overview
This PR implements cross-endpoint routing for serverless functions, enabling functions to seamlessly execute locally or remotely based on service discovery configuration. The implementation adds a service discovery layer that queries a mothership directory service to find endpoint URLs and routes function calls accordingly.
Key Changes:
- New runtime module with HTTP client for mothership directory service
- Service registry that loads manifests and performs function-to-endpoint routing
- Production wrapper that intercepts stub calls and routes to local or remote endpoints
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/tetra_rp/runtime/__init__.py` | New runtime module initialization |
| `src/tetra_rp/runtime/config.py` | Centralized configuration constants for HTTP client and caching |
| `src/tetra_rp/runtime/exceptions.py` | Custom exception hierarchy for runtime errors |
| `src/tetra_rp/runtime/directory_client.py` | HTTP client implementation for mothership directory API |
| `src/tetra_rp/runtime/service_registry.py` | Service discovery and routing logic with manifest loading |
| `src/tetra_rp/runtime/production_wrapper.py` | Execution router that wraps stub calls and handles remote execution |
| `src/tetra_rp/stubs/registry.py` | Integration with existing stub layer via wrapper injection |
| `tests/conftest.py` | Moved imports to top for better organization |
| `tests/unit/runtime/test_directory_client.py` | Unit tests for directory client HTTP operations |
| `tests/unit/runtime/test_service_registry.py` | Unit tests for service registry routing logic |
| `tests/unit/runtime/test_production_wrapper.py` | Unit tests for production wrapper execution routing |
| `tests/integration/test_cross_endpoint_routing.py` | Integration tests for complete routing flow |
… builtin

The custom RuntimeError class in runtime.exceptions was shadowing Python's built-in RuntimeError, creating ambiguity. Renamed to FlashRuntimeError as the base exception class for all cross-endpoint runtime errors. Derived exceptions (RemoteExecutionError, SerializationError, ManifestError, DirectoryUnavailableError) now inherit from FlashRuntimeError.

Addresses Copilot review feedback on PR #129.
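The resulting hierarchy can be sketched as below. The class names come from the commit message; deriving FlashRuntimeError from `Exception` and the docstrings are assumptions, since exceptions.py itself is not shown here.

```python
class FlashRuntimeError(Exception):
    """Base class for cross-endpoint runtime errors (base type assumed)."""


class RemoteExecutionError(FlashRuntimeError):
    """A function failed while running on a remote endpoint."""


class SerializationError(FlashRuntimeError):
    """Arguments or results could not be serialized or deserialized."""


class ManifestError(FlashRuntimeError):
    """The manifest is missing or malformed."""


class DirectoryUnavailableError(FlashRuntimeError):
    """The directory service could not be reached."""


def describe(exc: Exception) -> str:
    """Show that the distinct base name keeps the two error families apart."""
    if isinstance(exc, FlashRuntimeError):
        return "flash runtime error"
    if isinstance(exc, RuntimeError):
        return "built-in runtime error"
    return "other"
```

With the old name, `except RuntimeError` inside the runtime package would have referred to the custom class rather than the built-in one, which is the ambiguity the rename removes.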
Add detailed documentation for PR #129 covering:

User Guide:
- Quick start with manifest and environment setup
- Configuration guide with manifest structure explanation
- Usage patterns for microservice architecture, mixed local/remote, and fallback scenarios
- Error handling and serialization guidelines

Contributor Guide:
- Architecture overview with data flow diagrams
- Core component documentation (ProductionWrapper, ServiceRegistry, DirectoryClient, Exceptions)
- Integration points with stub layer and ResourceManager
- Design decision rationale
- Extension points for serialization, directory backends, and routing policies
- Testing strategy and debugging approaches

Documentation is verified against actual code implementation with:
- Correct manifest format (function_registry + resources structure)
- Accurate method names and signatures
- Proper exception hierarchy (FlashRuntimeError base class)
- Correct HTTP library (httpx, not aiohttp)
- Accurate configuration constants and defaults

jhcipar approved these changes on Jan 8, 2026
Create new src/tetra_rp/runtime/serialization.py with reusable functions for cloudpickle + base64 encoding/decoding to eliminate duplication across 6 production files:
- serialize_arg(), serialize_args(), serialize_kwargs()
- deserialize_arg(), deserialize_args(), deserialize_kwargs()

This addresses the PR #129 comment to refactor duplicated serialization code. All serialization now goes through a single, consistent interface with proper error handling via SerializationError.

Updated files:
- production_wrapper.py: Use serialize_args/kwargs
- live_serverless.py: Use serialize_args/kwargs
- execute_class.py: Use serialize_args/kwargs for constructor and method args
- generic_handler.py: Use deserialize/serialize utilities
- lb_handler.py: Use deserialize/serialize for /execute endpoint
- load_balancer_sls.py: Use serialize/deserialize for HTTP-based stub

All 581 tests passing. Code coverage: 65.37%.
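A minimal sketch of what such helpers might look like is given below. The function names match the commit message, but the signatures, return types, and error wrapping are assumptions. The sketch falls back to the standard `pickle` module when cloudpickle is not installed, purely so the example runs anywhere; the real module uses cloudpickle.

```python
import base64
from typing import Any, Dict, List, Sequence

try:
    import cloudpickle as pickler  # what the PR uses
except ImportError:  # fallback for this sketch only
    import pickle as pickler


class SerializationError(Exception):
    """Stand-in for the runtime's SerializationError."""


def serialize_arg(value: Any) -> str:
    """Pickle a value and encode it as base64 text."""
    try:
        return base64.b64encode(pickler.dumps(value)).decode("utf-8")
    except Exception as exc:
        raise SerializationError(f"cannot serialize {type(value).__name__}") from exc


def deserialize_arg(encoded: str) -> Any:
    """Reverse serialize_arg()."""
    try:
        return pickler.loads(base64.b64decode(encoded))
    except Exception as exc:
        raise SerializationError("cannot deserialize argument") from exc


def serialize_args(args: Sequence[Any]) -> List[str]:
    return [serialize_arg(arg) for arg in args]


def serialize_kwargs(kwargs: Dict[str, Any]) -> Dict[str, str]:
    return {key: serialize_arg(value) for key, value in kwargs.items()}


def deserialize_args(encoded: Sequence[str]) -> List[Any]:
    return [deserialize_arg(item) for item in encoded]


def deserialize_kwargs(encoded: Dict[str, str]) -> Dict[str, Any]:
    return {key: deserialize_arg(value) for key, value in encoded.items()}
```

Base64 text keeps the pickled bytes safe to embed in a JSON request body. As with any pickle-based format, payloads should only be deserialized when they come from a trusted source.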
Rename DirectoryClient to ManifestClient to better reflect its purpose as the manifest directory service (endpoint registry) rather than a generic directory. This addresses PR #129 comment regarding naming clarity.

Changes:
- Rename src/tetra_rp/runtime/directory_client.py to manifest_client.py
- Rename class DirectoryClient -> ManifestClient
- Rename exception DirectoryUnavailableError -> ManifestServiceUnavailableError
- Update all imports and references in:
  - service_registry.py
  - exceptions.py
  - All test files (test_manifest_client.py, test_service_registry.py, test_cross_endpoint_routing.py)

The manifest directory service fetches an endpoint registry that maps resource_config names to their deployment URLs from the mothership API.

All 581 tests passing. Code coverage: 65.37%.
Create new src/tetra_rp/runtime/models.py with Pydantic-inspired dataclasses:
- FunctionMetadata: Function definition with name, module, async status, HTTP routing
- ResourceConfig: Resource configuration with type, handler, and functions
- Manifest: Top-level manifest with version, project name, function registry, resources

This addresses the PR #129 comment to improve manifest type safety and IDE support.

Changes:
- ServiceRegistry now loads manifests into Manifest objects
- Maintains backward compatibility with dict-based manifests in handler generators
- Updated get_all_resources() and get_resource_functions() to convert to dicts
- Updated HandlerGenerator and LBHandlerGenerator to work with both dict and Manifest
- Updated test fixtures to use attribute access instead of dict access

Manifest.to_dict() allows serialization to JSON, and Manifest.from_dict() allows deserialization from JSON.

All 581 tests passing. Code coverage: 65.68%.
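The dataclass approach might look like the following sketch. The class names and the `to_dict()` / `from_dict()` methods are from the commit message; every field name and default here is an assumption inferred from the prose descriptions, and the HTTP routing fields are omitted because their shape is not described.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class FunctionMetadata:
    name: str
    module: str
    is_async: bool = False  # assumed field name for "async status"


@dataclass
class ResourceConfig:
    resource_type: str  # assumed field names
    handler_file: str
    functions: List[FunctionMetadata] = field(default_factory=list)


@dataclass
class Manifest:
    version: str
    project_name: str
    function_registry: Dict[str, str] = field(default_factory=dict)
    resources: Dict[str, ResourceConfig] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses, dicts, and lists.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Manifest":
        resources = {
            name: ResourceConfig(
                resource_type=cfg["resource_type"],
                handler_file=cfg["handler_file"],
                functions=[FunctionMetadata(**fn) for fn in cfg.get("functions", [])],
            )
            for name, cfg in data.get("resources", {}).items()
        }
        return cls(
            version=data["version"],
            project_name=data["project_name"],
            function_registry=dict(data.get("function_registry", {})),
            resources=resources,
        )
```

Typed attributes such as `manifest.resources["gpu"].functions` give IDE completion and catch misspelled keys earlier than raw dict access would.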
This was referenced Feb 6, 2026
Summary
Implement cross-endpoint routing to enable serverless functions to call functions deployed on different endpoints. Functions can now seamlessly execute locally or remotely based on service discovery configuration.
Implementation
Core Components
Key Features
Code Quality
Changes
- src/tetra_rp/runtime/directory_client.py - New HTTP client for mothership API
- src/tetra_rp/runtime/service_registry.py - New service discovery layer
- src/tetra_rp/runtime/production_wrapper.py - New execution router
- src/tetra_rp/runtime/config.py - Centralized configuration
- src/tetra_rp/runtime/exceptions.py - Custom exception hierarchy
- src/tetra_rp/stubs/registry.py - Integration with stub layer

Architecture
Functions are routed based on a manifest that maps function names to resource configurations. The service registry queries the mothership directory to find endpoint URLs; the production wrapper then decides whether to execute the function locally or make a remote HTTP call to another endpoint.
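The routing decision described above can be sketched as a single lookup. `resolve_target` and its parameters are hypothetical, and treating unknown functions or missing directory entries as "run locally" is an assumption based on the graceful-degradation goal, not confirmed behaviour.

```python
from typing import Dict, Optional


def resolve_target(
    function_name: str,
    function_registry: Dict[str, str],  # function name -> resource name (manifest)
    directory: Dict[str, str],          # resource name -> endpoint URL (mothership)
    current_resource: str,              # resource this process is deployed as
) -> Optional[str]:
    """Return the remote endpoint URL, or None to execute locally."""
    resource = function_registry.get(function_name)
    if resource is None or resource == current_resource:
        return None  # not in the manifest, or owned by this endpoint
    # Missing directory entry: fall back to local execution (assumption).
    return directory.get(resource)


registry = {"preprocess": "cpu_worker", "train": "gpu_worker"}
directory = {"gpu_worker": "https://example.invalid/v2/gpu-endpoint"}

local = resolve_target("preprocess", registry, directory, "cpu_worker")
remote = resolve_target("train", registry, directory, "cpu_worker")
```

Here `local` is `None` because `preprocess` belongs to the current resource, while `remote` is the URL registered for `gpu_worker`.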